Virtual machine data backup

ABSTRACT

Disclosed is a method and system for efficiently backing up a virtual machine file. A virtual machine file is logically divided into a plurality of fixed-size blocks of similar size, for example, a number of 1 MB data blocks. An MD5 hash value is generated from the contents of each block. Each block is written to a file having a filename that includes a filesystem-compliant form (e.g., hexadecimal form) of the computed MD5 hash value. A backup device includes a directory hierarchy having a plurality of first-level directories corresponding to the first two bytes of the hash value, and a plurality of second-level directories corresponding to the next two bytes of the hash value. The blocks are uniquely stored in the directory corresponding to the byte value pairs of the hash. The present disclosure provides data integrity checking and reduces storage requirements for duplicative, redundant, or null data.

CROSS REFERENCE TO RELATED APPLICATIONS

The present application claims the benefit of and priority to U.S. Provisional Application Ser. No. 61/168,315, filed on Apr. 10, 2009, entitled “VIRTUAL MACHINE DATA BACKUP”; U.S. Provisional Application Ser. No. 61/168,318, filed on Apr. 10, 2009, entitled “VIRTUAL MACHINE FILE-LEVEL RESTORATION”; and U.S. Provisional Application Ser. No. 61/172,435, filed on Apr. 24, 2009, entitled “VIRTUAL MACHINE DATA REPLICATION”; the entirety of each of which is hereby incorporated by reference herein for all purposes.

BACKGROUND

1. Technical Field

The present disclosure relates to computer data backup, and in particular, to a system and method for performing block-level backups of virtual machines, wherein backed-up data is stored in de-duplicated form in a hierarchical directory structure.

2. Background of Related Art

Continuing advances in storage technology allow vast amounts of digital data to be stored cheaply and efficiently. However, in the event of a failure or catastrophe, equally vast amounts of data can be lost. Therefore, data backup is a critical component of computer-based systems. As used herein, the term “backup” may refer to the act of creating copies of data, and may refer to the actual backed-up copy of the original data. The original data typically resides on a hard drive, or on an array of hard drives, but may also reside on other forms of storage media, such as solid state memory. Data backups are necessary for several reasons, including disaster recovery, restoring data lost due to storage media failure, recovering accidentally deleted data, and repairing corrupted data resulting from malfunctioning or malicious software.

A virtual machine (VM) is a software abstraction of an underlying physical (i.e., hardware) machine which enables one or more instances of an operating system, or even one or more operating systems, to run concurrently on a physical host machine. Virtual machines have become popular with administrators of data centers, which can contain dozens, hundreds, or even thousands of physical machines. The use of virtual servers greatly simplifies the task of configuring and administering servers in a large scale environment, because a virtual machine may be quickly placed into service without incurring the expense of provisioning a hardware machine at a data center. Virtualization is highly scalable, enabling servers to be allocated or deallocated in response to changes in demand. Support and administration requirements may be reduced because virtual servers are readily monitored and accessed using remote administration tools and diagnostic software.

In one aspect, a virtual server consists of three components. The first component is virtualization software configured to run on the host machine which performs the hardware abstraction, often referred to as a hypervisor. The second component is a data file which represents the filesystem of the virtual machine, which typically contains the virtual machine's operating system, applications, data files, etc. A virtual machine data file may be a hard disk image file, such as, without limitation, a Virtual Machine Disk Format (VMDK) file. Thus, for each virtual machine, a separate virtual machine file is required. The third component is the physical machine on which the virtualization software executes. A physical machine may include a processor, random-access memory, internal or external disk storage, and input/output interfaces, such as network, storage, and desktop interfaces (e.g., keyboard, pointing device, and graphic display interfaces).

In installations having many machines, traditional methods of performing backups may become burdensome and tend to be unduly resource-intensive, particularly in a virtual environment. In addition, backing up multiple instances of essentially identical virtual servers (as typically found in, e.g., “server farms” or in clustered systems) often results in large amounts of redundant backup data, which becomes difficult to manage and store. A backup system which performs virtual server backups with increased efficiency and effectiveness would be a welcome advance.

SUMMARY

The disclosed method processes 1 MB fixed-length blocks of data of a virtual machine file. A unique identifier, such as without limitation, an MD5 hash, is created for each block of data. The 1 MB of data can be compressed, or left uncompressed. The 1 MB of data is stored as a single file. The file name is the MD5 hash value of the 1 MB data block. The hash of this file is saved to a separate index file for later use to retrieve, validate, and rebuild the backup data. The data blocks, whether in compressed or uncompressed form, are stored at a storage destination, in a unique directory structure consisting of 256 first level directories designated as 00-FF, each having 256 second level directories designated as 00-FF within, comprising 65,536 directories in total. The 1 MB compressed (or uncompressed) data files are stored in the directory structure based on the first four bytes of the hash (here, the first four characters of its hexadecimal form), e.g.,

-   -   “./00/22/T.002249a8a218ef8a4da87550f388942d.gz”.

The first four bytes of data for the file name are “0022”. The file is stored in directory “./00/22/”. The .gz extension indicates the file is compressed.

Subsequent backups are performed having as a destination the same storage location. Data blocks are generated and identified using the unique hash described above. A file query is made to the storage location to see if there is already a file existing with the same hash. If the file does not exist, the source data is written into the directory hierarchy with the hash as the file name and an index file is updated. If the file exists, then only the index file is updated for the current backup being run.

Over time the directory structure will accumulate data blocks from all backups sent thereto. A separate index file is created for each backup, and is used to keep track of the blocks of data for, e.g., re-assembling the data blocks of the original source during restoration.

The use of a hash also provides a self-checking mechanism which enables self-validation of the data within the stored file. A routine is scheduled to run on an ad-hoc or periodic basis that reads the data within a stored file, and validates the data in the file to verify a match to the hash file name. If the data does not match, the block is considered suspect, and is slated to be deleted. All associated backups that include this data block are flagged as “bad”. The index file corresponding to backups so flagged may additionally or alternatively include a “bad” flag.

In an embodiment, the data blocks (e.g., the 1 MB data blocks) may be evaluated to determine whether the data contained therein exhibits a predefined (“special”) data pattern. For example, without limitation, a special data pattern may include a particular or repeating pattern, e.g., a data block consisting entirely of zero (00H) bytes. In this instance, a special hash is generated that represents the special data block containing the particular data pattern. The special hash may be hard-coded, defined in a database, and/or defined in a configuration file. Since the contents of a special data block are predefined, it is only necessary to record the fact that the data block is special. It is unnecessary to store the actual contents of a special block. Thus, for each data block identified as special, the index file is updated accordingly and the backup proceeds. In this manner, resources are conserved since special blocks, e.g., null blocks, do not consume space on the storage device, do not use communication bandwidth during backup and restoration procedures, do not require as many computational resources, and so forth. This provides an efficient way to skip special (e.g., null) data in a given backup set.

In another aspect, disclosed is a method for backing up computer data that includes the steps of dividing a source data file into a plurality of fixed size blocks, wherein each block is of equal blocksize. A unique block identifier relating to the contents of a fixed size block is generated. On a destination storage device, a directory hierarchy is provided having a plurality of first-level directories corresponding to a first portion of the unique block identifier, and a plurality of second-level directories corresponding to a second portion of the unique block identifier. A datablock file representative of the fixed size block is stored in a corresponding second level directory.

In yet another aspect, disclosed is machine-readable media comprising a set of instructions configured to perform a method for backing up computer data that includes the steps of dividing a source data file into a plurality of fixed size blocks, wherein each block is of equal blocksize. A unique block identifier relating to the contents of a fixed size block is generated. On a destination storage device, a directory hierarchy is provided having a plurality of first-level directories corresponding to a first portion of the unique block identifier, and a plurality of second-level directories corresponding to a second portion of the unique block identifier. A datablock file representative of the fixed size block is stored in a corresponding second level directory.

Also disclosed is a system for performing data backup that includes a processor, a storage device operably coupled to the processor, and a data backup module. The data backup module includes a set of instructions executable on the processor for performing a method of data backup. The method includes the steps of dividing a source data file into a plurality of fixed size blocks, wherein each block is of equal blocksize. A unique block identifier relating to the contents of a fixed size block is generated. On a destination storage device, a directory hierarchy is provided having a plurality of first-level directories corresponding to a first portion of the unique block identifier, and a plurality of second-level directories corresponding to a second portion of the unique block identifier. A datablock file representative of the fixed size block is stored in a corresponding second level directory.

BRIEF DESCRIPTION OF THE DRAWINGS

The above and other aspects, features, and advantages of the present disclosure will become more apparent in light of the following detailed description when taken in conjunction with the accompanying drawings in which:

FIG. 1 shows a block diagram of an embodiment of a virtual machine backup system in accordance with the present disclosure;

FIG. 2 is a flowchart of an embodiment of a virtual machine backup method in accordance with the present disclosure;

FIG. 3 is a block diagram illustrating a directory hierarchy of an embodiment of a virtual machine backup in accordance with the present disclosure; and

FIG. 4 is a flow diagram of an embodiment of a virtual machine backup in accordance with the present disclosure.

DETAILED DESCRIPTION

Particular embodiments of the present disclosure are described hereinbelow with reference to the accompanying drawings; however, it is to be understood that the disclosed embodiments are merely examples of the disclosure, which may be embodied in various forms. Well-known functions or constructions are not described in detail to avoid obscuring the present disclosure in unnecessary detail. Therefore, specific structural and functional details disclosed herein are not to be interpreted as limiting, but merely as a basis for the claims and as a representative basis for teaching one skilled in the art to variously employ the present disclosure in virtually any appropriately detailed structure. In the discussion contained herein, the terms user interface element and/or button are understood to be non-limiting, and include other user interface elements such as, without limitation, a hyperlink, clickable image, and the like.

Additionally, the present invention may be described herein in terms of functional block components, code listings, optional selections, page displays, and various processing steps. It should be appreciated that such functional blocks may be realized by any number of hardware and/or software components configured to perform the specified functions. For example, the present invention may employ various integrated circuit components, e.g., memory elements, processing elements, logic elements, look-up tables, and the like, which may carry out a variety of functions under the control of one or more microprocessors or other control devices.

Similarly, the software elements of the present invention may be implemented with any programming or scripting language such as C, C++, C#, Java, COBOL, assembler, PERL, Python, PHP, or the like, with the various algorithms being implemented with any combination of data structures, objects, processes, routines or other programming elements. The object code created may be executed by any computer having an Internet Web Browser, on a variety of operating systems including Windows, Macintosh, and/or Linux.

Further, it should be noted that the present invention may employ any number of conventional techniques for data transmission, signaling, data processing, network control, and the like.

It should be appreciated that the particular implementations shown and described herein are illustrative of the invention and its best mode and are not intended to otherwise limit the scope of the present invention in any way. Examples are presented herein which may include sample data items (e.g., names, dates, etc.) which are intended as examples and are not to be construed as limiting. Indeed, for the sake of brevity, conventional data networking, application development and other functional aspects of the systems (and components of the individual operating components of the systems) may not be described in detail herein. Furthermore, the connecting lines shown in the various figures contained herein are intended to represent example functional relationships and/or physical or virtual couplings between the various elements. It should be noted that many alternative or additional functional relationships or physical or virtual connections may be present in a practical electronic data communications system.

As will be appreciated by one of ordinary skill in the art, the present invention may be embodied as a method, a data processing system, a device for data processing, and/or a computer program product. Accordingly, the present invention may take the form of an entirely software embodiment, an entirely hardware embodiment, or an embodiment combining aspects of both software and hardware. Furthermore, the present invention may take the form of a computer program product on a computer-readable storage medium having computer-readable program code means embodied in the storage medium. Any suitable computer-readable storage medium may be utilized, including hard disks, CD-ROM, DVD-ROM, optical storage devices, magnetic storage devices, semiconductor storage devices (e.g., USB thumb drives) and/or the like.

The present invention is described below with reference to block diagrams and flowchart illustrations of methods, apparatus (e.g., systems), and computer program products according to various aspects of the invention. It will be understood that each functional block of the block diagrams and the flowchart illustrations, and combinations of functional blocks in the block diagrams and flowchart illustrations, respectively, can be implemented by computer program instructions. These computer program instructions may be loaded onto a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions that execute on the computer or other programmable data processing apparatus create means for implementing the functions specified in the flowchart block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the function specified in the flowchart block or blocks. The computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process such that the instructions that execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart block or blocks.

Accordingly, functional blocks of the block diagrams and flowchart illustrations support combinations of means for performing the specified functions, combinations of steps for performing the specified functions, and program instruction means for performing the specified functions. It will also be understood that each functional block of the block diagrams and flowchart illustrations, and combinations of functional blocks in the block diagrams and flowchart illustrations, can be implemented by either special purpose hardware-based computer systems that perform the specified functions or steps, or suitable combinations of special purpose hardware and computer instructions.

One skilled in the art will also appreciate that, for security reasons, any databases, systems, or components of the present invention may consist of any combination of databases or components at a single location or at multiple locations, wherein each database or system includes any of various suitable security features, such as firewalls, access codes, encryption, de-encryption, compression, decompression, and/or the like.

The scope of the invention should be determined by the appended claims and their legal equivalents, rather than by the examples given herein. For example, the steps recited in any method claims may be executed in any order and are not limited to the order presented in the claims. Moreover, no element is essential to the practice of the invention unless specifically described herein as “critical” or “essential.”

FIG. 1 illustrates a representative operating environment 100 for an example embodiment of a virtual machine backup system 105 in accordance with the present disclosure. Representative operating environment 100 includes virtual machine backup system 105 which can be a personal computer (PC) or a server, which further includes at least one system bus 150 which couples system components, including at least one processor 110; a system memory 115 which may include random-access memory (RAM); at least one storage device 130, such as without limitation one or more hard disks, CD-ROMs or DVD-ROMs, or other non-volatile storage devices, such as without limitation flash memory devices; and a data network interface 140. System bus 150 may include any type of data communication structure, including without limitation a memory bus or memory controller, a peripheral bus, a virtual bus, a software bus, and/or a local bus using any bus architecture such as without limitation PCI, USB or IEEE 1394 (Firewire). Data network interface 140 may be a wired network interface such as a 100Base-T Fast Ethernet interface, or a wireless network interface such as without limitation a wireless network interface compliant with the IEEE 802.11 (i.e., WiFi), GSM, or CDMA standard.

Virtual machine backup system 105 may be operated in a networked environment via data network interface 140, wherein system 105 is connected to one or more virtual machine hosts 160 by a data network 180, such as a local area network or the Internet, for the transmission and reception of data, such as without limitation backing up and restoring virtual machine data files as will be further described herein. Each of the one or more virtual machine hosts 160 may include one or more virtual machines 170 operating therein, as will be appreciated by the skilled artisan.

Virtual machine backup system 105 includes a virtual machine backup module 120 that is configured to perform a method of virtual machine data backup as described herein. In an embodiment, virtual machine backup module 120 includes a set of programmable instructions adapted to execute on processor 110 for performing the disclosed method of virtual machine data backup. In particular, a method for backing up a virtual disk file or virtual machine file, e.g., a VMDK file, is presented herein. With reference to FIG. 4, a virtual machine file 420 slated for backup may be stored on a storage device, such as without limitation, hard disk 410. While it is contemplated that hard disk 410 may be included within a virtual machine host, it is to be understood that a virtual machine file 420 may be stored on a hard disk array, such as a storage-area network (SAN), a redundant array of independent disks (RAID), network-attached storage (NAS) and/or on any storage medium now or in the future known.

The virtual machine file 420 is logically divided into a number of fixed-length blocks 430 of like size. In one embodiment, a blocksize of 1 MB is used; however, it is to be understood that a blocksize of less than 1 MB, or greater than 1 MB, may be used within the scope of the disclosed method. In one aspect, the blocksize is determined at least in part by a correlation between performance and blocksize. Other parameters affecting blocksize may include, without limitation, a data bus speed, a data bus width, a virtual machine file size, a processor speed, a storage device bandwidth, and a network throughput. If the size of a virtual machine file does not precisely equal a multiple of the chosen fixed blocksize, the remainder may be padded with, e.g., zeros, nulls, or any other fill pattern, to achieve a set of equal-sized blocks.
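The division into equal-sized blocks, including padding of a short final block, can be sketched as follows. This is a minimal illustration only; the function name read_blocks, the fill parameter, and the use of Python are assumptions made for the example and are not part of the disclosure.

```python
import io
from typing import BinaryIO, Iterator

BLOCK_SIZE = 1024 * 1024  # 1 MB, the example blocksize given in the text


def read_blocks(stream: BinaryIO, block_size: int = BLOCK_SIZE,
                fill: bytes = b"\x00") -> Iterator[bytes]:
    """Yield equal-sized blocks from a buffered binary stream.

    The last block is padded with the single-byte `fill` pattern when the
    stream length is not an exact multiple of `block_size` (hypothetical
    helper; zero fill is one of the patterns the text mentions).
    """
    while True:
        chunk = stream.read(block_size)
        if not chunk:
            break  # end of stream
        if len(chunk) < block_size:
            chunk += fill * (block_size - len(chunk))
        yield chunk


# Small demonstration with a 4-byte blocksize instead of 1 MB.
demo = list(read_blocks(io.BytesIO(b"abcdefghij"), block_size=4))
print(demo)
```

A real implementation would presumably also record the original file length so that the padding can be discarded on restoration; the text does not specify how this is done.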

An individual backup data file 445 is created from each fixed-length block 430 of the virtual machine file 420. In an embodiment, individual backup data file 445 may be given a temporary filename, and/or stored in a temporary location, e.g., /var/tmp/block000001.dat. A hash is generated according to the contents of each individual backup data file. In an embodiment, a 128-bit MD5 hash is used to create the hash value from the contents thereof. The resultant hash value is stored in an index file corresponding to the current backup session, which is stored for later use during, e.g., data restoration. The index file may include, without limitation, a list of data blocks comprising the backup set, hash values corresponding thereto, a date and time of backup, a source location, and a destination location. A collection of hash values representative of a backup of a virtual machine file, and data associated therewith, may be stored in an index file 455. Such a collection, together with the individual backup data files comprising the backed-up virtual machine file 420, is known as a “backup set.”
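As a brief illustration of the hashing step, the sketch below computes an MD5 digest of a block's contents using Python's standard hashlib module. The function name block_identifier is an assumption for this example. MD5 yields a 128-bit digest, which is 32 characters in hexadecimal form.

```python
import hashlib


def block_identifier(block: bytes) -> str:
    """Return the MD5 digest of a block as a 32-character hex string.

    Hypothetical helper; the hex form is filesystem-safe and can be used
    directly in a filename.
    """
    return hashlib.md5(block).hexdigest()


ident = block_identifier(b"example block contents")
print(ident, len(ident) * 4, "bits")
```

Identical block contents always produce the same identifier, which is the property the de-duplication scheme relies on.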

Additionally or alternatively, the data block 430 may be compressed during a compression step 432 using any suitable manner of data compression, including without limitation, LZW, zip, gzip, rar, and/or bzip. Preferably, lossless data compression is used; however, in certain embodiments lossy data compression may advantageously be used.

The hash value may be regarded as a unique block identifier, or a unique identifier of a backup data file 445. A non-temporary (“archival”) filename of the backup data file may be generated, at least in part, from the hash value, as illustrated in step 434. For example, the filename of a backup data file 445 may be created by appending a hexadecimal representation of the hash value to a file prefix and/or to an appropriate file extension. Each backup data file 445 comprising the virtual machine file therefore has a unique filename based upon the hash value.

As seen in FIG. 3, a hierarchical directory structure 300 is provided on a backup storage device, e.g., storage device 130, for storing the backup data files. The disclosed structure has at a first level thereof a plurality of directories 320 et seq. (e.g., folders). Each first level directory contains therein a plurality of second level directories 330. In an embodiment, the hierarchy includes 256 first level directories, wherein each first level directory includes 256 second level directories, for a total number of 65,536 directories. The first level and second level directories may be named in accordance with an eight-bit (two-digit) hexadecimal value, e.g., 00-FF. Thus, for example, a plurality of first level directories may be named in accordance with the series ./00, ./01, ./02 . . . ./FF while a second level of directories may be named ./00/01, ./00/02 . . . ./00/FF. Other directory mapping schemes are envisioned within the scope of the present disclosure, such as without limitation, a directory hierarchy having fewer than two levels, a directory hierarchy having greater than two levels, a directory hierarchy having a directory naming convention that includes fewer than an eight-bit hexadecimal value, a directory hierarchy having a directory naming convention that includes greater than an eight-bit hexadecimal value, and/or a directory hierarchy having a directory naming convention that includes an alternative naming encoding, such as octal, ASCII85, and the like.
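The two-level hierarchy can be enumerated with a short sketch such as the one below. The function name hierarchy_dirs and the choice of lowercase hexadecimal names (matching the lowercase hash in the example filenames) are assumptions for illustration; the text designates the range as 00-FF.

```python
from typing import Iterator


def hierarchy_dirs(root: str = ".") -> Iterator[str]:
    """Yield every second-level directory path in a 256 x 256 hierarchy.

    Hypothetical helper: names are two lowercase hex digits per level.
    """
    for first in range(256):
        for second in range(256):
            yield f"{root}/{first:02x}/{second:02x}"


dirs = list(hierarchy_dirs())
print(len(dirs), dirs[0], dirs[-1])
```

Each yielded path could be created up front with os.makedirs(path, exist_ok=True), or created lazily when the first block that maps to it is stored; the text does not state which approach is used.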

With reference now to FIG. 2, each backup data file may advantageously be stored (e.g., copied or moved) in the directory hierarchy in accordance with the first four bytes of the hash value thereof (here, the first four characters of its hexadecimal form). By way of example only, assume a backup data file representing a 1 MB block of a virtual machine file has an MD5 hash value of:

-   -   010249a8a218ef8a4da87550f388942d

The backup data file may be compressed with gzip and renamed in accordance with the present disclosure, e.g.:

-   -   T.010249a8a218ef8a4da87550f388942d.gz

Taking the first four bytes of the hash value, two at a time, the destination directory is identified as:

-   -   ./01/02

The backup data file is stored in the identified destination directory, hence the full pathname of the backup data file may be expressed as:

-   -   ./01/02/T.010249a8a218ef8a4da87550f388942d.gz
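The mapping from hash value to pathname in the example above can be sketched as follows. The function name block_path and its default root, prefix, and extension arguments are assumptions for illustration, chosen to reproduce the example values given in the text.

```python
import posixpath


def block_path(hash_hex: str, root: str = ".", prefix: str = "T.",
               ext: str = ".gz") -> str:
    """Build the archival pathname for a block from its hex hash.

    Hypothetical helper: the first two hex characters select the first
    level directory and the next two select the second level directory.
    """
    first, second = hash_hex[0:2], hash_hex[2:4]
    return posixpath.join(root, first, second, f"{prefix}{hash_hex}{ext}")


print(block_path("010249a8a218ef8a4da87550f388942d"))
```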

In this manner, each unique data block 430 corresponds to a backup data file 445 uniquely stored within the directory hierarchy 300. The present disclosure also contemplates a filename/directory mapping which uses greater than, less than, and/or other than the first four bytes of the hash value. During execution of a subsequent backup process, a filename is generated as previously described. A file query is made to the storage device, e.g., it is determined whether a backup data file having the same filename exists, and if so, it is presumed the block is unchanged from the prior backup, and the index file corresponding to the subsequent backup is updated to include the existing (e.g., unchanged) block. If, however, it is determined that a backup data file having the same filename does not exist, it is presumed the block changed and the newly-created backup data file is stored within the directory hierarchy as previously described herein, and a corresponding entry is written to the index file. In this manner, by ensuring that each data block is stored only once, increased efficiency, e.g., increased execution speed and reduced resource usage, is provided by a backup performed in accordance with the present disclosure.
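The store-if-absent behavior described above can be illustrated with the sketch below. It is a simplified assumption-laden example: the function name store_block, the in-memory list standing in for the index file, the “T.” prefix, and the “.dat” extension are all chosen for illustration only.

```python
import hashlib
import os
import tempfile
from typing import List


def store_block(root: str, block: bytes, index: List[str]) -> bool:
    """Write `block` only if no file with the same hash name exists.

    The block's hash is always appended to `index` (a stand-in for the
    per-backup index file). Returns True when a new file was written.
    """
    digest = hashlib.md5(block).hexdigest()
    directory = os.path.join(root, digest[0:2], digest[2:4])
    path = os.path.join(directory, f"T.{digest}.dat")
    written = False
    if not os.path.exists(path):  # the "file query" to the storage device
        os.makedirs(directory, exist_ok=True)
        with open(path, "wb") as fh:
            fh.write(block)
        written = True
    index.append(digest)
    return written


with tempfile.TemporaryDirectory() as root:
    index: List[str] = []
    print(store_block(root, b"same data", index))  # first copy is written
    print(store_block(root, b"same data", index))  # duplicate is skipped
```

Two backups containing the same block therefore share one stored file while each backup's index still lists the block.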

Advantageously, the disclosed method provides data integrity validation, which may identify data corruption. During data integrity validation, a backup data block is read (and, if required, expanded to an uncompressed form) whereupon a hash value is generated from the stored contents therein and compared to the hash value included in the filename. If the computed hash value corresponds to the filename hash value, it is presumed the archived data is correct and intact. If, however, a discrepancy is identified between the expected (filename) hash value and the actual (computed) hash value, the data block is flagged as bad. Any backup sets that include a bad backup data file may also be flagged as bad. Bad backup data files and/or backup sets may be slated for immediate deletion, or may be scheduled for deletion at a future time. Integrity validation may be performed on a periodic or routine basis, or may be performed prior to data restoration from a backup set.
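A validation pass over a single stored file might look like the following sketch. The function name validate_block_file and the assumed filename layout (“T.” prefix, 32 hex characters, then “.gz” for gzip-compressed files or another extension for uncompressed files) are illustrative assumptions, not details confirmed by the text.

```python
import gzip
import hashlib
import os
import zlib


def validate_block_file(path: str) -> bool:
    """Recompute the MD5 of a stored block and compare it to its filename.

    Assumes names of the form "T.<hash>.<ext>" (hypothetical layout).
    Unreadable or corrupt compressed data is treated as a failed check.
    """
    name = os.path.basename(path)
    expected = name.split(".")[1]
    opener = gzip.open if name.endswith(".gz") else open
    try:
        with opener(path, "rb") as fh:
            data = fh.read()
    except (OSError, EOFError, zlib.error):
        return False
    return hashlib.md5(data).hexdigest() == expected
```

A caller would flag the block, and any backup sets whose index references it, as bad whenever this check returns False.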

In another aspect, a virtual machine data block may be evaluated to determine whether it contains all zero bytes, all one bytes, contains null data, or exhibits some other relatively simple data pattern which obviates the need to physically store such data block. In this event, a unique “null” hash is generated and included within the index file, together with any associated data, without writing a backup data file to the storage device.

Turning to FIG. 2, an embodiment of the disclosed method begins in the step 205, and in the step 210 the first datablock 430 of a virtual machine file 420 is read. In the step 215, the datablock is evaluated to determine whether it exhibits a special data pattern, e.g., whether the datablock consists entirely of zeros (00H). If the datablock exhibits a special data pattern, in the step 225 a corresponding special unique block identifier is assigned to the datablock. In an embodiment, the special unique block identifier is a 32-digit hexadecimal number consisting of all zeros. The process continues with the step 245, as discussed below.
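The special-pattern test of the steps 215 and 225 can be sketched as shown below for the all-zeros case. The names NULL_BLOCK_ID and identify_block are assumptions for illustration; only the all-zero 32-digit identifier itself is taken from the text.

```python
import hashlib
from typing import Tuple

NULL_BLOCK_ID = "0" * 32  # special identifier: 32 hexadecimal zeros


def identify_block(block: bytes) -> Tuple[str, bool]:
    """Return (identifier, is_special) for a block.

    A block consisting entirely of zero bytes receives the special
    identifier and need not be hashed or stored; any other block is
    identified by its MD5 digest (hypothetical helper).
    """
    if block.count(0) == len(block):
        return NULL_BLOCK_ID, True
    return hashlib.md5(block).hexdigest(), False


print(identify_block(bytes(1024)))
print(identify_block(b"\x00\x01"))
```

When is_special is True, the caller would skip compression and the write to the storage device and proceed directly to recording the index entry.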

If, however, in the step 215 it is determined the datablock does not exhibit a special data pattern, then in the step 220 a hash function is performed on the contents of the datablock to generate a unique block identifier corresponding to the datablock. In an embodiment, the hash function is an MD5 hash function. The step 230 is performed next wherein the destination directory 330 et seq. within the directory hierarchy 300 is determined. The destination directory 330 et seq. is based at least in part upon the value of specific digits within the unique block identifier. In an embodiment, the first two bytes of the unique block identifier (e.g., the two most significant digits of the hash) represent the first level directory 320 et seq. within the directory hierarchy. The next two bytes of the unique block identifier (e.g., the next two most significant digits of the hash) represent the second level directory 330 et seq. within the directory hierarchy. The pathname of the datablock file as stored within the directory hierarchy may be formed by concatenating a pathname root string (e.g., “/mnt/bck/”), the first two significant hexadecimal digits of the unique identifier (e.g., “01”), a directory delimiter character (e.g., “/”), the next two significant hexadecimal digits of the unique identifier (e.g., “02”), a directory delimiter character (e.g., “/”), and the file name of the datablock (e.g., 010249a8a218ef8a4da87550f388942d.dat).

In the step 235, the datablock 430 is optionally compressed to reduce the amount of storage resources that will be required to store the datablock file. In embodiments, the manner of compression may be hard-coded, defined in a database (e.g., a registry database), and/or defined in a configuration file (e.g., via a “preferences” or “options” setting provided by a user interface or by hand-editing a configuration file) in accordance with user requirements. Any suitable manner of data compression may be employed, including without limitation, LZW, zip, gzip, rar, and/or bzip. Additionally or alternatively, the datablock may be cryptographically encoded using any suitable cryptosystem, including without limitation a symmetric-key cryptosystem (e.g., DES, Triple-DES, AES, and the like) or a public-key cryptosystem (e.g., RSA, Diffie-Hellman, elliptic curve techniques, and the like).
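The optional compression step can be illustrated with gzip from the Python standard library, as in the sketch below. The function name pack_block and the “.gz”/“.dat” extension convention are assumptions for this example, and encryption is deliberately omitted.

```python
import gzip
from typing import Tuple


def pack_block(block: bytes, compress: bool = True) -> Tuple[bytes, str]:
    """Return (payload, extension) for a block.

    Hypothetical helper: gzip is used when `compress` is set, in which
    case the extension marks the file as compressed.
    """
    if compress:
        return gzip.compress(block), ".gz"
    return block, ".dat"


payload, ext = pack_block(bytes(1024 * 1024))
print(ext, len(payload) < 1024 * 1024)
```

Note that in this sketch the identifying hash would be computed over the original block contents, not the compressed payload, so that validation after decompression compares like with like.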

In the step 240, the datablock 430 (which may be in its original form, compressed, encrypted, and/or combinations thereof) is written to a corresponding datablock file 445 in the destination directory 335 et seq. In the step 245, an index file entry 446 corresponding to the datablock 430 in an index file 445 is created. The index file 445 may contain entries relating solely to the current backup set, or may contain entries relating to a plurality of backup sets. In an embodiment, the index file 445 includes a database. For each corresponding datablock 430 identified within the index file 445, an index file entry 446 may include, without limitation, a unique block identifier value, a timestamp of the backup set, a timestamp relating to the backup time of the individual datablock, a datablock source location, and a datablock destination location. In embodiments, the datablock source location may include an identifier relating to the virtual machine from which the backup set was generated, a virtual machine host identifier, a machine name, a node name, a network address (e.g., an internet protocol address), a software identifier, a hardware identifier, an encryption key, and the like. In embodiments, the datablock destination location may include an identifier relating to the storage device on which the datablock file 445 is stored, a destination directory in which the datablock file 445 is stored, a pathname of the datablock file, a filename of the datablock file, a unique block identifier value, and the like.
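One possible shape of an index file entry is sketched below for illustration. The field names, the `IndexEntry` class, and the choice of one JSON object per line as the index file format are all assumptions of this sketch. The disclosure does not prescribe a particular format, and an embodiment may use a database instead.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class IndexEntry:
    block_id: str               # unique block identifier value
    backup_set_time: str        # timestamp of the backup set
    block_backup_time: str      # timestamp of the individual datablock backup
    source_location: str        # e.g. virtual machine or host identifier
    destination_location: str   # e.g. pathname of the datablock file


def append_index_entry(index_path: str, entry: IndexEntry) -> None:
    """Append one entry to the index file as a single JSON line."""
    with open(index_path, "a", encoding="utf-8") as index_file:
        index_file.write(json.dumps(asdict(entry)) + "\n")


def read_index(index_path: str) -> list:
    """Read all entries of the index file back, in the order they were written."""
    with open(index_path, encoding="utf-8") as index_file:
        return [IndexEntry(**json.loads(line)) for line in index_file if line.strip()]
```

Because entries are appended in the order in which datablocks are processed, the sequence of identifiers in such an index would be sufficient to locate the datablock files of a backup set in order.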

In the step 250, a test is performed whereby it is determined whether all datablocks 430 of the virtual machine file 420 have been processed. If not, the method 200 iterates to the step 210 wherein the next datablock 430 of the virtual machine file 420 is read, and processing proceeds as described hereinabove.
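The iteration over datablocks may be sketched, under simplifying assumptions, as the following loop. The function name `backup_file` and its parameters are hypothetical. The special data pattern test, compression, encryption, and index file creation are omitted for brevity, and a block whose datablock file already exists is assumed not to be written again, so that duplicate blocks are stored once.

```python
import hashlib
import os

BLOCK_SIZE = 1024 * 1024  # 1 MB blocks, as in the example embodiment


def backup_file(source_path: str, root: str, block_size: int = BLOCK_SIZE) -> list:
    """Store each fixed-size block of the source file once; return the block identifiers in order."""
    block_ids = []
    with open(source_path, "rb") as source:
        while True:
            block = source.read(block_size)    # read the next datablock
            if not block:                      # all datablocks have been processed
                break
            block_id = hashlib.md5(block).hexdigest()
            directory = os.path.join(root, block_id[0:2], block_id[2:4])
            os.makedirs(directory, exist_ok=True)
            path = os.path.join(directory, block_id + ".dat")
            if not os.path.exists(path):       # duplicate blocks are not stored again
                with open(path, "wb") as datablock_file:
                    datablock_file.write(block)
            block_ids.append(block_id)
    return block_ids
```

The final block of a source file may be shorter than the block size when the file length is not an exact multiple of it; the sketch stores such a block as read.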

The present disclosure is also directed to a computer-based apparatus and a computing system configured to perform a method of data backup as described herein. Also disclosed are computer-readable media comprising a set of instructions for performing a method of data backup as described herein.

While several embodiments of the disclosure have been shown in the drawings and/or discussed herein, it is not intended that the disclosure be limited thereto, as it is intended that the disclosure be as broad in scope as the art will allow and that the specification be read likewise. Therefore, the above description should not be construed as limiting, but merely as exemplifications of particular embodiments. The claims can encompass embodiments in hardware, software, or a combination thereof. Those skilled in the art will envision other modifications within the scope and spirit of the claims appended hereto.

1. A method for backing up computer data, comprising the steps of: dividing a source data file into a plurality of fixed size blocks, wherein each block is of equal blocksize; generating a unique block identifier relating to the contents of a fixed size block; on a destination storage device, providing a directory hierarchy having a plurality of first-level directories corresponding to a first portion of the unique block identifier and a plurality of second-level directories corresponding to a second portion of the unique block identifier; and storing a datablock file representative of the fixed size block in a corresponding second level directory.
2. The method in accordance with claim 1, further comprising: providing an index file corresponding to the source data file; and storing the unique block identifier in the index file.
3. The method in accordance with claim 1, wherein the fixed block size is in a range of about 256 KB to about 8 MB.
4. The method in accordance with claim 1, further comprising the step of compressing the datablock file representative of the fixed size block.
5. The method in accordance with claim 1, further comprising the step of encrypting the datablock file representative of the fixed size block.
6. The method in accordance with claim 1, wherein the unique block identifier is a hash generated in accordance with an MD5 algorithm.
7. The method in accordance with claim 1, further comprising the step of naming the datablock file representative of a fixed size block in accordance with the unique block identifier.
8. The method in accordance with claim 1, further comprising the steps of: computing a unique block identifier of a stored datablock file; retrieving a stored unique block identifier corresponding to the stored datablock; and determining a property of the stored datablock by comparing the computed unique block identifier to the stored unique block identifier.
9. The method in accordance with claim 1, further comprising: determining whether the fixed size block consists of a simple data pattern.
10. The method in accordance with claim 9, wherein the simple data pattern is selected from a group consisting of all zeros, all ones, and all nulls.
11. A system for performing data backup, comprising: a processor; a storage device operably coupled to the processor; and a data backup module including a set of instructions executable on the processor for performing a method of data backup comprising the steps of: dividing a source data file into a plurality of fixed size blocks, wherein each block is of equal blocksize; generating a unique block identifier relating to the contents of a fixed size block; on the storage device, providing a directory hierarchy having a plurality of first-level directories corresponding to a first portion of the unique block identifier and a plurality of second-level directories corresponding to a second portion of the unique block identifier; and storing a datablock file representative of the fixed size block in a corresponding second level directory.
12. The system in accordance with claim 11, wherein the method of data backup further comprises the steps of: providing an index file corresponding to the source data file; and storing the unique block identifier in the index file.
13. The system in accordance with claim 11, wherein the fixed block size is in a range of about 256 KB to about 8 MB.
14. The system in accordance with claim 11, wherein the method of data backup further comprises the step of compressing the datablock file representative of the fixed size block.
15. The system in accordance with claim 11, wherein the method of data backup further comprises the step of encrypting the datablock file representative of the fixed size block.
16. The system in accordance with claim 11, wherein the unique block identifier is a hash generated in accordance with an MD5 algorithm.
17. The system in accordance with claim 11, wherein the method of data backup further comprises the step of naming the datablock file representative of a fixed size block in accordance with the unique block identifier.
18. The system in accordance with claim 11, wherein the method of data backup further comprises the steps of: computing a unique block identifier of a stored datablock file; retrieving a stored unique block identifier corresponding to the stored datablock; and determining a property of the stored datablock by comparing the computed unique block identifier to the stored unique block identifier.
19. The system in accordance with claim 11, wherein the method of data backup further comprises the step of determining whether the fixed size block consists of a simple data pattern.
20. The system in accordance with claim 19, wherein the simple data pattern is selected from a group consisting of all zeros, all ones, and all nulls.
21. Machine-readable media comprising a set of instructions configured to perform the method of data backup in accordance with claims 1 through 10.